Rethinking detection based table structure recognition for visually rich document images
Keywords
1. Introduction
Fig. 1. Flowchart of a detection-based TSR solution.
1.1. Research objectives
1.2. Contributions
1. We comprehensively revisit existing detection-based TSR models and explore the factors that hinder their performance, including improper problem formulation, the mismatch between detection metrics and TSR metrics, the inherent characteristics of detection models, and the impact of feature extraction. Our analysis and findings can serve as a guideline for further improving detection-based TSR models.
2. Based on our analysis and findings, we apply three simple methods to improve Cascade R-CNN: proposing a pseudo-class generation method that transforms multi-label detection into a regular single-label detection problem; adjusting the aspect ratios and the number of region proposals in region proposal generation; and applying deformable convolution together with a Spatial Attention Module to build long-range dependencies and context information in the backbone network.
3. We conduct extensive experiments to evaluate the proposed solution on several datasets, including SciTSR (Chi et al., 2019), FinTabNet (Zheng et al., 2021), PubTabNet (Zhong et al., 2020) and PubTables1M (Smock et al., 2022), using both detection metrics and cell-level TSR metrics. The results show that the proposed solution outperforms state-of-the-art models on both kinds of metrics.
4. We further verify our analysis and findings with experiments, and we discuss and summarize valuable insights from the experimental results for future model design.
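The pseudo-class generation idea in contribution 2 can be illustrated with a short sketch: when one box carries several labels (e.g. a Row that is also part of the Column Header), the labels are merged into a single combined pseudo-class, so a standard single-label detector can be trained. The helper below is purely illustrative and not the authors' implementation; the `(box, label)` annotation format is an assumption.

```python
from itertools import groupby

def to_pseudo_classes(annotations):
    """Merge multi-label boxes into single combined pseudo-classes.

    `annotations` is a list of (box, label) pairs in which the same box
    may appear under several labels (multi-label detection). Boxes with
    identical coordinates are merged into one entry whose class name
    joins the sorted labels, e.g. 'column_header+row'.
    """
    by_box = lambda item: item[0]
    merged = []
    for box, group in groupby(sorted(annotations, key=by_box), key=by_box):
        labels = sorted({label for _, label in group})
        merged.append((box, "+".join(labels)))
    return merged

# One box annotated as both a row and a column header collapses into
# a single pseudo-class 'column_header+row'.
anns = [((0, 10, 100, 20), "row"),
        ((0, 10, 100, 20), "column_header"),
        ((0, 30, 100, 40), "row")]
print(to_pseudo_classes(anns))
```

After this transformation every box has exactly one class, so the standard softmax classification head of Cascade R-CNN applies without modification.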
1.3. Article structure
2. Related work
2.1. Object detection models
2.2. Table structure recognition
3. Rethinking detection-based TSR models
3.1. Preliminaries
3.1.1. Cascade R-CNN
Fig. 2. Overall architecture of Cascade R-CNN.
Fig. 3. Overall architecture of Sparse R-CNN.
3.1.2. Sparse R-CNN
Fig. 4. Different problem formulations for detection-based TSR.
3.2. Rethinking problem formulations
Table 1. Comparisons of different problem formulations.
| Study | Detection targets | Metrics | Outputs | Issues |
|---|---|---|---|---|
| Siddiqui et al. (2019); Hashmi et al. (2021) | Row, column | Precision, Recall, F1 | Regular cells | Information loss |
| Fernandes et al. (2023) | Regular row/column, irregular row/column | F1, TEDS | Regular cells, spanning cells | Information loss |
| Xiao, Akkaya et al. (2022) | Table, column, row, spanning cell | COCO | Regular cells, spanning cells | Information loss |
| Smock et al. (2022) | Table, column, row, spanning cell, column header, projected row header | COCO, GriTS | Regular cells, header cells, spanning cells, projected row header | Multi-label detection |
| This study | Table, column, row, spanning cell, column header, projected row header | COCO, TEDS | Regular cells, header cells, spanning cells, projected row header | – |
Fig. 5. Statistics of aspect ratio values of the COCO and FinTabNet training sets. When an aspect ratio is less than 1, its multiplicative inverse is counted instead.
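The statistic behind Fig. 5 can be reproduced with a small helper: each box's width/height ratio is computed, and ratios below 1 are replaced by their multiplicative inverse so that tall and wide boxes of the same elongation fall into the same bin. This is a sketch; the `(x1, y1, x2, y2)` box format is an assumption.

```python
def normalized_aspect_ratios(boxes):
    """Aspect ratio (width / height) per box; ratios below 1 are
    replaced by their multiplicative inverse, mirroring how Fig. 5
    counts elongated boxes regardless of orientation."""
    ratios = []
    for x1, y1, x2, y2 in boxes:
        r = (x2 - x1) / (y2 - y1)
        ratios.append(r if r >= 1 else 1 / r)
    return ratios

# A very wide box and a very tall box both count as aspect ratio 20.
print(normalized_aspect_ratios([(0, 0, 200, 10), (0, 0, 10, 200)]))
```

Such statistics motivate widening the anchor aspect-ratio range of the RPN, since table rows and columns are far more elongated than typical COCO objects.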
3.3. Revisiting region proposal generation
3.4. Rethinking detection and TSR metrics
Fig. 6. A sample from the FinTabNet dataset with ground truth boxes larger than the minimum bounding boxes for table structure. We only show the annotations of Columns for simplicity.
Fig. 7. A sample from the FinTabNet dataset. We only show its Row annotations for simplicity. The first Row in this Figure contains three major parts numbered 1 to 3.
3.5. Rethinking feature extraction
Fig. 8. Examples of our proposed problem formulation. Since the definitions of Table, Column, and Spanning Cell are the same as in PubTables1M, only Row, Column Header and Projected Row Header are shown for simplicity.
4. Proposed method
4.1. Proposed problem formulation
4.2. Tuning parameters of RPN
4.3. Spatial attention and deformable convolution
Fig. 9. Architecture of the proposed Spatial Attention Module. A ResNet backbone consists of a STEM block and four stages of residual blocks. Our proposed Spatial Attention Module is inserted between the blocks of the backbone to build long-range dependencies.
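As a rough illustration of the kind of module shown in Fig. 9, the sketch below implements CBAM-style spatial attention in plain Python: channel-wise average and max pooling produce two spatial maps, a randomly initialized weight pair (a stand-in for a learned convolution) mixes them, and a sigmoid yields a per-position attention map that rescales every channel. The paper's actual module may differ in kernel sizes and exact placement; this is only a minimal sketch.

```python
import math
import random

def spatial_attention(feature_map, seed=0):
    """CBAM-style spatial attention sketch.

    feature_map: list of C channels, each an H x W grid (list of lists).
    Channel-wise average and max pooling give two H x W maps; a randomly
    initialized mixing weight (standing in for a learned convolution)
    plus a sigmoid yields an H x W attention map that rescales every
    channel, letting distant positions share a common spatial weighting.
    """
    rnd = random.Random(seed)
    w_avg, w_max = rnd.gauss(0, 1), rnd.gauss(0, 1)
    C, H, W = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [feature_map[c][i][j] for c in range(C)]
            logit = w_avg * (sum(vals) / C) + w_max * max(vals)
            attn[i][j] = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return [[[feature_map[c][i][j] * attn[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]

x = [[[1.0] * 4 for _ in range(4)] for _ in range(8)]  # C=8, H=4, W=4
y = spatial_attention(x)
```

The output keeps the input shape, so the module can be dropped between backbone stages without changing the rest of the network.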
5. Experiments
5.1. Datasets and experimental settings
Table 2. Summary of datasets.
| Dataset | Train | Validation | Test |
|---|---|---|---|
| SciTSR (Chi et al., 2019) | 7,453 | 1,034 | 2,134 |
| FinTabNet (Zheng et al., 2021) | 78,537 | 9,650 | 9,289 |
| PubTabNet (Zhong et al., 2020) | 500,777 | 9,115 | – |
| PubTables1M (Smock et al., 2022) | 758,849 | 94,959 | 93,834 |
5.2. Implementation details and experimental results
Table 3. Key training parameters of the proposed model. MAX_ITER and STEPS are for the FinTabNet dataset as examples.
| Parameter | Value | Description |
|---|---|---|
| RESNETS.NORM | nnSyncBN | Batch Normalization for the Backbone Network |
| MAX_ITER | 112,500 | Total number of mini-batch iterations |
| STEPS | 84,375 | The iteration at which the learning rate schedule decays the learning rate |
| SCHEDULER | MultiStepLR | The scheduler to change the learning rate |
| NMS_THRESH | 0.9 | Non-maximum suppression threshold |
| PRE_NMS_TOPK_TRAIN | 4000 | RPN proposals to keep before applying NMS in training |
| PRE_NMS_TOPK_TEST | 2000 | RPN proposals to keep before applying NMS in testing |
| POST_NMS_TOPK_TRAIN | 4000 | RPN proposals to keep after applying NMS in training |
| POST_NMS_TOPK_TEST | 2000 | RPN proposals to keep after applying NMS in testing |
| DEFORM_ON_PER_STAGE | [True, True, True, True] | Whether to use deformable convolution in backbone stages |
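The settings in Table 3 can be gathered into a detectron2-style configuration sketch. The key names below follow detectron2's naming conventions and are illustrative assumptions; only the values come from Table 3.

```python
# Detectron2-style config sketch mirroring Table 3 (FinTabNet values).
# Key names follow detectron2 conventions; treat them as illustrative.
cfg = {
    "MODEL.RESNETS.NORM": "nnSyncBN",            # sync batch norm in backbone
    "SOLVER.MAX_ITER": 112_500,                  # total mini-batch iterations
    "SOLVER.STEPS": (84_375,),                   # learning-rate decay point
    "SOLVER.LR_SCHEDULER_NAME": "MultiStepLR",
    "MODEL.RPN.NMS_THRESH": 0.9,                 # loose NMS: rows/columns overlap heavily
    "MODEL.RPN.PRE_NMS_TOPK_TRAIN": 4000,
    "MODEL.RPN.PRE_NMS_TOPK_TEST": 2000,
    "MODEL.RPN.POST_NMS_TOPK_TRAIN": 4000,
    "MODEL.RPN.POST_NMS_TOPK_TEST": 2000,
    "MODEL.RESNETS.DEFORM_ON_PER_STAGE": [True, True, True, True],
}
```

Note the unusually high `NMS_THRESH` and proposal counts: adjacent rows and columns produce many heavily overlapping boxes, so aggressive suppression would discard valid table-structure candidates.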
Table 4. Experimental results on the SciTSR, FinTabNet and PubTables1M datasets with structure-only TEDS scores. Sim. denotes tables without spanning cells and Com. denotes tables with spanning cells.
| Dataset | Model | Sim. | Com. | All |
|---|---|---|---|---|
| SciTSR | Cascade R-CNN | 77.31 | 84.74 | 79.09 |
| SciTSR | Deformable-DETR | 98.17 | 94.59 | 97.30 |
| SciTSR | Sparse R-CNN | 95.92 | 98.30 | |
| SciTSR | TSRDet(Ours) | 98.59 | 97.88 | 98.41 |
| FinTabNet | Cascade R-CNN | 82.17 | 92.50 | 87.49 |
| FinTabNet | Deformable-DETR | 98.08 | 97.54 | 97.81 |
| FinTabNet | Sparse R-CNN | 98.36 | 97.91 | 98.13 |
| FinTabNet | TSRDet(Ours) | 99.08 | 99.02 | 99.05 |
| PubTables1M | Cascade R-CNN | 82.73 | 85.21 | 83.78 |
| PubTables1M | Deformable-DETR | 97.54 | 93.14 | 95.73 |
| PubTables1M | Sparse R-CNN | 99.04 | 95.90 | 97.72 |
| PubTables1M | TSRDet(Ours) | 99.19 | 97.66 | 98.55 |
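The structure-only TEDS metric used in Table 4 scores a predicted table against the ground truth as 1 − EditDist(T_pred, T_true) / max(|T_pred|, |T_true|) over HTML trees. The sketch below approximates it on flattened tag sequences using Levenshtein distance; the real metric uses a tree edit distance on the HTML structure, so this is only a rough stand-in for intuition.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def approx_teds_struct(tags_pred, tags_true):
    """TEDS-style score: 1 - dist / max length. True TEDS uses tree
    edit distance on HTML trees; flattening the tag sequence and using
    Levenshtein is only a rough approximation."""
    if not tags_pred and not tags_true:
        return 1.0
    d = edit_distance(tags_pred, tags_true)
    return 1.0 - d / max(len(tags_pred), len(tags_true))

# Prediction misses one <td>: one edit against five tags -> 0.8.
pred = ["<tr>", "<td>", "<td>", "</tr>"]
true = ["<tr>", "<td>", "<td>", "<td>", "</tr>"]
print(round(approx_teds_struct(pred, true), 2))  # 0.8
```

Because the score is normalized by table size, a single missed cell hurts a small table much more than a large one, which is why cell-level errors and TEDS do not move in lockstep with detection mAP.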
Table 5. Experimental results with Mean Average Precision (mAP).
| Dataset | Model | Table | Column | Row | Spanning cell | Projected row header | Column header | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SciTSR | Cascade R-CNN | 93.89 | 95.27 | 94.80 | 95.81 | 93.89 | 92.96 | 98.96 | 98.63 | 96.33 | 88.58 | 83.80 | 97.01 |
| Deformable-DETR | 96.28 | 97.39 | 97.01 | 96.75 | 96.55 | 96.07 | 98.96 | 98.63 | 97.26 | 93.84 | 90.86 | 98.15 | |
| Sparse R-CNN | 94.78 | 96.17 | 95.48 | 95.49 | 95.07 | 90.08 | 98.98 | 98.30 | 97.93 | 88.06 | 86.92 | 98.49 | |
| TSRDet(Ours) | 96.28 | 96.79 | 96.57 | 99.01 | 96.42 | 95.65 | 98.97 | 99.25 | 98.57 | 95.30 | 87.06 | 98.50 | |
| FinTabNet | Cascade R-CNN | 95.23 | 97.53 | 96.90 | 87.32 | 95.31 | 93.08 | 99.00 | 96.69 | 96.96 | 84.43 | 96.63 | 97.64 |
| Deformable-DETR | 96.68 | 98.42 | 97.98 | 75.17 | 95.53 | 95.58 | 99.00 | 97.55 | 96.95 | 91.91 | 96.62 | 98.04 | |
| Sparse R-CNN | 96.38 | 98.37 | 97.69 | 62.11 | 96.22 | 95.86 | 99.01 | 97.79 | 97.84 | 88.39 | 97.29 | 97.97 | |
| TSRDet(Ours) | 97.50 | 98.33 | 98.09 | 91.60 | 97.40 | 97.15 | 99.01 | 98.83 | 97.99 | 94.62 | 96.61 | 97.93 | |
| PubTables1M | Cascade R-CNN | 93.40 | 95.38 | 94.76 | 85.75 | 93.32 | 92.57 | 99.01 | 98.76 | 87.56 | 82.18 | 95.81 | 97.11 |
| Deformable-DETR | 94.82 | 97.43 | 96.79 | 78.33 | 92.55 | 94.48 | 98.99 | 97.89 | 95.84 | 85.04 | 95.43 | 95.74 | |
| Sparse R-CNN | 96.46 | 98.14 | 97.60 | 84.25 | 95.73 | 96.45 | 99.00 | 98.42 | 98.03 | 87.85 | 97.91 | 97.57 | |
| TSRDet(Ours) | 97.72 | 98.26 | 98.04 | 94.76 | 97.43 | 97.33 | 99.01 | 98.99 | 98.41 | 94.21 | 97.88 | 97.85 |
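The mAP values in Table 5 rest on IoU-based matching: a predicted box counts as a true positive when its IoU with a same-class ground-truth box exceeds a threshold, and COCO-style mAP averages precision over thresholds from 0.50 to 0.95. The IoU core can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Two 10x10 boxes overlapping by half their width share a third of
# their union area.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333
```

This is also where the metric mismatch discussed in Section 6.2 originates: a box can clear the IoU threshold (counting toward mAP) yet still split or merge cells once rows and columns are intersected, degrading TEDS.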
Fig. 10. A sample of prediction result from the FinTabNet testing set.
5.3. Comparison with non-detection-based models
Table 6. Experimental results on the FinTabNet dataset with structure-only TEDS scores. Sim. denotes tables without spanning cells and Com. denotes tables with spanning cells.
| Model | Sim. | Com. | All |
|---|---|---|---|
| EDD (Zhong et al., 2020) | 88.40 | 92.08 | 90.60 |
| TableFormer (Nassar et al., 2022) | 97.50 | 96.00 | 96.80 |
| TableMaster (Ye et al., 2021) | 98.36 | 98.28 | 98.32 |
| VAST (Huang et al., 2023) | – | – | 98.63 |
| MTL-TabNet (Ly & Takasu, 2023) | 99.07 | 98.46 | 98.79 |
| TSRFormer-DQ-DETR (Wang, Lin et al., 2023) | – | – | 98.40 |
| TSRDet(Ours) | 99.08 | 99.02 | 99.05 |
Table 7. Experimental results on the PubTabNet validation set with structure-only TEDS scores. Sim. denotes tables without spanning cells and Com. denotes tables with spanning cells. The proposed model is trained on the PubTables1M dataset, while the benchmark models are trained on the PubTabNet dataset.
| Model | Sim. | Com. | All |
|---|---|---|---|
| EDD (Zhong et al., 2020) | 91.10 | 88.70 | 89.90 |
| RobustTabNet (Ma et al., 2023) | – | – | 97.00 |
| TSRNet (Li, Yin et al., 2022) | – | – | 95.64 |
| VAST (Huang et al., 2023) | – | – | 97.23 |
| TableFormer (Nassar et al., 2022) | 98.50 | 95.00 | 96.75 |
| MTL-TabNet (Ly & Takasu, 2023) | 99.05 | 96.66 | 97.88 |
| TSRDet(Ours) | 96.99 | 94.99 | 96.58 |
5.4. Ablation study
Table 8. Ablation study results on the FinTabNet dataset with structure-only TEDS scores. Asp_Ratio Tuning, Single_Label, DEFORM, and S_Attn are short for applying aspect ratio tuning, the single-label formulation, deformable convolution, and spatial attention, respectively.
| Model | Asp_Ratio tuning | Single_Label | DEFORM | S_Attn | Sim. | Com. | All |
|---|---|---|---|---|---|---|---|
| Cascade R-CNN | | | | | 82.17 | 92.50 | 87.49 |
| Ablation 1 | ✓ | | | | 81.45 | 87.11 | 84.35 |
| Ablation 2 | | ✓ | | | 84.27 | 95.80 | 90.23 |
| Ablation 3 | ✓ | ✓ | | | 95.17 | 98.63 | 96.95 |
| Ablation 4 | ✓ | ✓ | ✓ | | 96.44 | 99.14 | 97.83 |
| Ablation 5 | ✓ | ✓ | | ✓ | 96.95 | 98.75 | 97.88 |
| TSRDet(Ours) | ✓ | ✓ | ✓ | ✓ | 99.08 | 99.02 | 99.05 |
Table 9. Ablation study results regarding mean Average Precision (mAP). The model names are aligned with models in Table 8.
| Model | Table | Column | Row | Spanning Cell | Projected row header | Column header | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cascade R-CNN | 95.23 | 97.53 | 96.90 | 87.32 | 95.31 | 93.08 | 99.00 | 96.69 | 96.96 | 84.43 | 96.63 | 97.64 |
| Ablation 1 | 97.22 | 98.03 | 97.90 | 90.11 | 96.72 | 96.76 | 99.00 | 98.95 | 96.16 | 94.98 | 96.01 | 98.19 |
| Ablation 2 | 95.54 | 97.54 | 96.91 | 87.43 | 95.79 | 94.04 | 99.00 | 97.04 | 97.64 | 84.84 | 96.67 | 98.02 |
| Ablation 3 | 95.51 | 97.56 | 96.94 | 88.43 | 95.52 | 93.76 | 99.00 | 97.31 | 97.87 | 84.74 | 96.82 | 97.28 |
| Ablation 4 | 97.83 | 98.37 | 98.13 | 91.91 | 97.65 | 97.58 | 99.00 | 98.96 | 98.33 | 95.78 | 96.98 | 97.93 |
| Ablation 5 | 96.97 | 97.84 | 97.58 | 90.32 | 96.88 | 96.21 | 99.00 | 98.83 | 98.03 | 91.97 | 96.58 | 97.37 |
| TSRDet(Ours) | 97.50 | 98.33 | 98.09 | 91.60 | 97.40 | 97.15 | 99.01 | 98.83 | 97.99 | 94.62 | 96.61 | 97.93 |
6. Discussions and analysis
6.1. Multi-label detection
6.2. The misalignment of metrics
Fig. 11. Comparison of results from the Ablation 1 and Ablation 3 models. Even though Ablation 1 achieves better detection performance, its structure-only TEDS score is much lower than that of Ablation 3.
6.3. Deformable convolution and spatial attention
6.4. Analysis of the generalization capacities
Table 10. Experimental results in the cross-dataset setting with structure-only TEDS scores. Sim. denotes tables without spanning cells and Com. denotes tables with spanning cells.
| Training set | Testing set | Sim. | Com. | All |
|---|---|---|---|---|
| SciTSR | SciTSR | 98.59 | 97.88 | 98.41 |
| SciTSR | FinTabNet | 77.39 | 78.73 | 78.08 |
| SciTSR | PubTables1M | 59.51 | 59.96 | 59.70 |
| FinTabNet | SciTSR | 96.75 | 93.77 | 96.03 |
| FinTabNet | FinTabNet | 99.08 | 99.02 | 99.05 |
| FinTabNet | PubTables1M | 76.78 | 76.78 | 76.78 |
| PubTables1M | SciTSR | 91.02 | 93.19 | 91.54 |
| PubTables1M | FinTabNet | 81.99 | 79.40 | 80.66 |
| PubTables1M | PubTables1M | 99.19 | 97.66 | 98.55 |
6.5. Analysis of the failed cases
Fig. 12. A failed prediction example from the FinTabNet testing set.
Fig. 13. A failed prediction example from the FinTabNet testing set.
Fig. 14. A failed prediction example from the FinTabNet testing set.
Fig. A.15. A failed prediction example from the SciTSR testing set.
Fig. A.16. A failed prediction example from the SciTSR testing set.
Fig. A.17. A failed prediction example from the SciTSR testing set.
Fig. A.18. A failed prediction example from the PubTables1M testing set.
Fig. A.19. A failed prediction example from the PubTables1M testing set.
Fig. A.20. A failed prediction example from the PubTables1M testing set.
6.6. Other observations
6.7. Summary of insights
7. Conclusion and future work
CRediT authorship contribution statement
Declaration of competing interest
Acknowledgments
Appendix. More failed prediction cases
References
- Adiga et al. (2019). Table structure recognition based on cell relationship, a bottom-up approach. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), INCOMA Ltd, pp. 1-8.
- Bacea and Oniga (2023). Single stage architecture for improved accuracy real-time object detection on mobile devices. Image and Vision Computing, 130, Article 104613.
- Cai, Z., & Vasconcelos, N. (2018). Cascade R-CNN: Delving into high quality object detection. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6154-6162.
- Carion et al. (2020). End-to-end object detection with transformers. In European Conference on Computer Vision, Springer, pp. 213-229.
- Chen et al. (2019). MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155.
- Chen et al. (2023). Enhanced training of query-based object detection via selective query recollection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 23756-23765.
- Chi et al. (2019). Complicated table structure recognition. arXiv preprint arXiv:1908.04729.
- Chollet (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1251-1258.
- Dai et al. (2017). Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, IEEE, pp. 764-773.
- Deng et al. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 248-255.
- Ding et al. (2022). Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 11963-11975.
- Fernandes et al. (2023). TableStrRec: Framework for table structure recognition in data sheet images. International Journal on Document Analysis and Recognition (IJDAR), pp. 1-19.
- Guo et al. (2022). SegNeXt: Rethinking convolutional attention design for semantic segmentation. Advances in Neural Information Processing Systems, 35, pp. 1140-1156.
- Hashmi et al. (2021). Guided table structure recognition through anchor optimization. IEEE Access, 9, pp. 113521-113534.
- He et al. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, IEEE, pp. 2961-2969.
- He et al. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 770-778.
- Hong et al. (2022). Dynamic Sparse R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4723-4732.
- Howard et al. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
- Hu et al. (2021). Touching text line segmentation combined local baseline and connected component for Uchen Tibetan historical documents. Information Processing & Management, 58(6), Article 102689.
- Huang et al. (2023). Improving table structure recognition with visual-alignment sequential coordinate modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 11134-11143.
- Ioffe (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
- JaidedA (2022). EasyOCR.
- Krogh and Hertz (1991). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4.
- Kuang et al. (2021). MMOCR: A comprehensive toolbox for text detection, recognition and understanding. arXiv preprint arXiv:2108.06543.
- Li, Li et al. (2022). YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976.
- Li, Yin et al. (2022). Table structure recognition and form parsing by end-to-end object detection and relation parsing. Pattern Recognition, 132, Article 108946.
- Lin et al. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 2117-2125.
- Lin et al. (2014). Microsoft COCO: Common objects in context. In Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, Springer, pp. 740-755.
- Liu, Li et al. (2022). Neural collaborative graph machines for table structure recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4533-4542.
- Liu, Mao et al. (2022). A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 11976-11986.
- Lu et al. (2021). MASTER: Multi-aspect non-local network for scene text recognition. Pattern Recognition, 117, Article 107980.
- Ly and Takasu (2023). An end-to-end multi-task learning model for image-based table recognition. In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, SciTePress, pp. 626-634.
- Ma et al. (2023). Robust table detection and structure recognition from heterogeneous document images. Pattern Recognition, 133, Article 109006.
- Mendes and Saraiva (2017). Tabula: A language to model spreadsheet tables. arXiv preprint arXiv:1707.02833.
- Mondal et al. (2023). Dataset agnostic document object detection. Pattern Recognition, 142, Article 109698.
- Nassar et al. (2022). TableFormer: Table structure understanding with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4614-4623.
- Nguyen et al. (2023). Formerge: Recover spanning cells in complex table structure using transformer network. In International Conference on Document Analysis and Recognition, Springer, pp. 522-534.
- Pascanu (2013). On the difficulty of training recurrent neural networks. arXiv preprint arXiv:1211.5063.
- Prasad et al. (2020). CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, pp. 572-573.
- Qiao et al. (2021). LGPMA: Complicated table structure recognition with local and global pyramid mask alignment. In International Conference on Document Analysis and Recognition, Springer, pp. 99-114.
- Rastan et al. (2019). TEXUS: A unified framework for extracting and understanding tables in PDF documents. Information Processing & Management, 56(3), pp. 895-918.
- Ren et al. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
- Ren et al. (2023). detrex: Benchmarking detection transformers. arXiv preprint arXiv:2306.07265.
- Schreiber, S., Agne, S., Wolf, I., Dengel, A. R., & Ahmed, S. (2017). DeepDeSRT: Deep learning for detection and structure recognition of tables in document images. In 2017 14th IAPR International Conference on Document Analysis and Recognition, Vol. 01, pp. 1162-1167.
- Shen, H., Gao, X., Wei, J., Qiao, L., Zhou, Y., Li, Q., et al. (2023). Divide rows and conquer cells: Towards structure recognition for large tables. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23), pp. 1369-1377.
- Siddiqui et al. (2019). DeepTabStR: Deep learning based table structure recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, pp. 1403-1409.
- Siddiqui et al. (2018). DeCNT: Deep deformable CNN for table detection. IEEE Access, 6, pp. 74151-74161.
- Singer-Vine (2022). pdfplumber.
- Smock and Pesala (2021). Table Transformer.
- Smock et al. (2022). PubTables-1M: Towards comprehensive table extraction from unstructured documents. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4634-4642.
- Smock et al. (2023). Aligning benchmark datasets for table structure recognition. In Fink, G. A., Jain, R., Kise, K., & Zanibbi, R. (Eds.), Document Analysis and Recognition - ICDAR 2023, Springer Nature Switzerland, Cham, pp. 371-386.
- Sun, Jiang et al. (2021). What makes for end-to-end object detection? In International Conference on Machine Learning, PMLR, pp. 9934-9944.
- Sun, Zhang et al. (2021). Sparse R-CNN: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 14454-14463.
- Tensmeyer et al. (2019). Deep splitting and merging for table structure decomposition. In 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, pp. 114-121.
- Tian et al. (2019). FCOS: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, pp. 9627-9636.
- Vaswani et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Wang, Bochkovskiy et al. (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE.
- Wang, Lin et al. (2023). Robust table structure recognition with dynamic queries enhanced detection transformer. Pattern Recognition, 144, Article 109817.
- Wu et al. (2019). Detectron2.
- Wu, Ma et al. (2023). DRFN: A unified framework for complex document layout analysis. Information Processing & Management, 60(3), Article 103339.
- Wu, Xiao et al. (2023). Cross-domain document layout analysis using document style guide. Expert Systems with Applications, Article 123039.
- Xiao, Akkaya et al. (2022). Efficient information sharing in ICT supply chain social network via table structure recognition. In GLOBECOM 2022 - 2022 IEEE Global Communications Conference, IEEE, pp. 4661-4666.
- Xiao, Akkaya et al. (2023). Multi-modal OCR system for the ICT global supply chain. In ICC 2023 - IEEE International Conference on Communications, IEEE, pp. 3096-3101.
- Xiao, Simsek et al. (2022). Handling big tabular data of ICT supply chains: A multi-task, machine-interpretable approach. In GLOBECOM 2022 - 2022 IEEE Global Communications Conference, IEEE, pp. 504-509.
- Xiao et al. (2023b). Revisiting table detection datasets for visually rich documents. arXiv preprint arXiv:2305.04833.
- Xiao et al. (2023c). Table detection for visually rich document images. Knowledge-Based Systems, 282, Article 111080.
- Xue et al. (2021). TGRNet: A table graph reconstruction network for table structure recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, IEEE, pp. 1295-1304.
- Ye et al. (2021). PingAn-VCGroup's solution for ICDAR 2021 competition on scientific literature parsing task B: Table recognition to HTML. arXiv preprint arXiv:2105.01848.
- Yu et al. (2023). An effective method for figures and tables detection in academic literature. Information Processing & Management, 60(3), Article 103286.
- Zhang et al. (2023). Dense distinct query for end-to-end object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, pp. 7329-7338.
- Zhang et al. (2022). Split, embed and merge: An accurate table structure recognizer. Pattern Recognition, 126, Article 108565.
- Zheng et al. (2021). Global Table Extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, IEEE, pp. 697-706.
- Zhong et al. (2020). Image-based table recognition: Data, model, and evaluation. In European Conference on Computer Vision, Springer, pp. 564-580.
- Zhu, X., Su, W., Lu, L., Li, B., Wang, X., & Dai, J. (2021). Deformable DETR: Deformable transformers for end-to-end object detection. In International Conference on Learning Representations.